Search | Regional Portal of the BVS

A multi-omics data analysis workflow packaged as a FAIR Digital Object.

Niehues, Anna; de Visser, Casper; Hagenbeek, Fiona A; Kulkarni, Purva; Pool, René; Karu, Naama; Kindt, Alida S D; Singh, Gurnoor; Vermeiren, Robert R J M; Boomsma, Dorret I; van Dongen, Jenny; 't Hoen, Peter A C; van Gool, Alain J.

Gigascience ; 13, 2024 Jan 02.

Article in English | MEDLINE | ID: mdl-38217405

ABSTRACT

BACKGROUND: Applying good data management and FAIR (Findable, Accessible, Interoperable, and Reusable) data principles in research projects can facilitate knowledge discovery, study result reproducibility, and data reuse in future studies. Based on the concepts of the original FAIR principles for research data, FAIR principles for research software were recently proposed. FAIR Digital Objects enable both humans and machines to discover and reuse Research Objects, including computational workflows. Practical examples can help promote the adoption of FAIR practices for computational workflows in the research community. We developed a multi-omics data analysis workflow implementing FAIR practices to share it as a FAIR Digital Object. FINDINGS: We conducted a case study investigating shared patterns between multi-omics data and childhood externalizing behavior. The analysis workflow was implemented as a modular pipeline in the workflow manager Nextflow, including containers with software dependencies. We adhered to software development practices such as version control, documentation, and licensing. Finally, the workflow was described with rich semantic metadata, packaged as a Research Object Crate, and shared via WorkflowHub. CONCLUSIONS: Along with the packaged multi-omics data analysis workflow, we share our experiences adopting various FAIR practices and creating a FAIR Digital Object. We hope our experiences can help other researchers who develop omics data analysis workflows to turn FAIR principles into practice.
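A Research Object Crate is a directory with a JSON-LD metadata file, `ro-crate-metadata.json`, that describes its contents. The sketch below builds a minimal RO-Crate 1.1 metadata document for a single Nextflow workflow. The file name `main.nf`, the title, and the description are placeholders and are not taken from the authors' crate. The Workflow RO-Crate profile used by WorkflowHub also asks for further properties, such as authorship and licensing, which are left out here.

```python
import json

def build_workflow_crate(workflow_file: str, name: str, description: str) -> dict:
    """Return a minimal RO-Crate 1.1 metadata document for one Nextflow workflow."""
    return {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": [
            # Metadata descriptor: states that this JSON-LD file describes the root dataset.
            {
                "@id": "ro-crate-metadata.json",
                "@type": "CreativeWork",
                "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
                "about": {"@id": "./"},
            },
            # Root dataset: lists the workflow and marks it as the main entity.
            {
                "@id": "./",
                "@type": "Dataset",
                "name": name,
                "description": description,
                "hasPart": [{"@id": workflow_file}],
                "mainEntity": {"@id": workflow_file},
            },
            # The workflow file itself, typed as a computational workflow.
            {
                "@id": workflow_file,
                "@type": ["File", "SoftwareSourceCode", "ComputationalWorkflow"],
                "name": name,
                "programmingLanguage": {"@id": "#nextflow"},
            },
            # Contextual entity describing the workflow language.
            {
                "@id": "#nextflow",
                "@type": "ComputerLanguage",
                "name": "Nextflow",
                "url": {"@id": "https://www.nextflow.io/"},
            },
        ],
    }

# Placeholder values, for illustration only.
crate = build_workflow_crate("main.nf", "Multi-omics analysis (example)", "Hypothetical example crate")
print(json.dumps(crate, indent=2))
```

Writing this output to `ro-crate-metadata.json` at the root of the workflow repository would turn that directory into a crate. A packaging tool can then add authors, license, and container references as further entities.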

Subject(s)

Multiomics , Software , Humans , Child , Workflow , Reproducibility of Results , Metadata

FAIR Genomes metadata schema promoting Next Generation Sequencing data reuse in Dutch healthcare and research.

van der Velde, K Joeri; Singh, Gurnoor; Kaliyaperumal, Rajaram; Liao, XiaoFeng; de Ridder, Sander; Rebers, Susanne; Kerstens, Hindrik H D; de Andrade, Fernanda; van Reeuwijk, Jeroen; De Gruyter, Fini E; Hiltemann, Saskia; Ligtvoet, Maarten; Weiss, Marjan M; van Deutekom, Hanneke W M; Jansen, Anne M L; Stubbs, Andrew P; Vissers, Lisenka E L M; Laros, Jeroen F J; van Enckevort, Esther; Stemkens, Daphne; 't Hoen, Peter A C; Beliën, Jeroen A M; van Gijn, Mariëlle E; Swertz, Morris A.

Sci Data ; 9(1): 169, 2022 04 13.

Article in English | MEDLINE | ID: mdl-35418585

ABSTRACT

The genomes of thousands of individuals are profiled within Dutch healthcare and research each year. However, these valuable genomic data, the associated clinical data, and consent are captured in different ways and stored across many systems and organizations. This makes it difficult to discover rare disease patients, reuse data for personalized medicine, and establish research cohorts based on specific parameters. FAIR Genomes aims to enable NGS data reuse by developing metadata standards for the data descriptions needed to FAIRify genomic data, while also addressing ELSI issues. We developed a semantic schema of essential data elements harmonized with international FAIR initiatives. The FAIR Genomes schema v1.1 contains 110 elements in 9 modules. It reuses common ontologies such as NCIT, DUO and EDAM, introducing new terms only when necessary. The schema is represented by a YAML file that can be transformed into templates for electronic data capture (EDC) software and into programmatic interfaces (JSON, RDF) to ease genomic data sharing in research and healthcare. The schema, documentation and MOLGENIS reference implementation are available at https://fairgenomes.org .
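The step that turns one YAML schema into several target formats can be sketched as follows. The sketch assumes the YAML file has already been parsed into a Python dict, for example with a YAML library. It uses a hypothetical, heavily simplified layout: the module names, element names, and `example.org` IRIs are placeholders, not real FAIR Genomes v1.1 elements or NCIT/DUO/EDAM terms. This is not the MOLGENIS reference implementation. The code emits an empty JSON record for data entry and one N-Triples statement per element that links it to its ontology term.

```python
import json

# Hypothetical stand-in for a parsed schema YAML file (placeholder names and IRIs).
SCHEMA = {
    "modules": [
        {
            "name": "Sample",
            "elements": [
                {"name": "SampleID", "ontology": "http://example.org/term/sample-id"},
                {"name": "TissueType", "ontology": "http://example.org/term/tissue"},
            ],
        },
        {
            "name": "Sequencing",
            "elements": [
                {"name": "SequencingPlatform", "ontology": "http://example.org/term/platform"},
            ],
        },
    ]
}

def to_json_template(schema: dict) -> str:
    """Empty data-entry record: one object per module, one null field per element."""
    template = {
        module["name"]: {element["name"]: None for element in module["elements"]}
        for module in schema["modules"]
    }
    return json.dumps(template, indent=2)

def to_ntriples(schema: dict, base: str = "http://example.org/schema/") -> list[str]:
    """Link each schema element to its ontology term with skos:exactMatch (N-Triples)."""
    predicate = "<http://www.w3.org/2004/02/skos/core#exactMatch>"
    triples = []
    for module in schema["modules"]:
        for element in module["elements"]:
            subject = f"<{base}{module['name']}/{element['name']}>"
            triples.append(f"{subject} {predicate} <{element['ontology']}> .")
    return triples

print(to_json_template(SCHEMA))
print("\n".join(to_ntriples(SCHEMA)))
```

Because both outputs come from the same single source, changing an element in the YAML file updates the entry template and the semantic mapping together.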

Subject(s)

High-Throughput Nucleotide Sequencing , Metadata , Delivery of Health Care , Genomics , Humans , Software

Extracting knowledge networks from plant scientific literature: potato tuber flesh color as an exemplary trait.

Singh, Gurnoor; Papoutsoglou, Evangelia A; Keijts-Lalleman, Frederique; Vencheva, Bilyana; Rice, Mark; Visser, Richard G F; Bachem, Christian W B; Finkers, Richard.

BMC Plant Biol ; 21(1): 198, 2021 Apr 24.

Article in English | MEDLINE | ID: mdl-33894758

ABSTRACT

BACKGROUND: Scientific literature carries a wealth of information crucial for research, but only a fraction of it is present as structured information in databases and can therefore be analyzed using traditional data analysis tools. Natural language processing (NLP) is often and successfully employed to support humans by distilling relevant information from large corpora of free text and structuring it in a way that lends itself to further computational analyses. For this pilot, we developed a pipeline that uses NLP on biological literature to produce knowledge networks. We focused on the flesh color of potato, a well-studied trait with known associations, and investigated whether these knowledge networks can assist us in formulating new hypotheses on the underlying biological processes. RESULTS: We trained an NLP model on a manually annotated corpus of 34 full-text potato articles to recognize relevant biological entities (genes, proteins, metabolites and traits) and the relationships between them in text. This model detected biological entities with a precision of 97.65% and a recall of 88.91% on the training set. We conducted a time series analysis on 4023 PubMed abstracts of plant genetics articles focusing on four major Solanaceous crops (tomato, potato, eggplant and capsicum). The analysis showed that the networks contained both previously known leads and leads that were unknown at the time but pointed to subsequently discovered biological phenomena relating to flesh color. A novel time-based analysis of these networks indicated a connection between our trait and a candidate gene (zeaxanthin epoxidase) two years before that connection was explicitly stated in the literature. CONCLUSIONS: Our time-based analysis indicates that network-assisted hypothesis generation shows promise for knowledge discovery, data integration and hypothesis generation in scientific research.
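The time-based idea can be pictured with a small sketch. It makes two assumptions, neither taken from the authors' pipeline. First, each extracted relation is stored as a `(year, entity, relation, entity)` tuple. Second, an "early lead" is taken to be an entity two hops from the trait in the network built from papers up to a given year, before any direct trait-entity link has been published. All entity names are placeholders, not findings from the study.

```python
from collections import defaultdict

# One extracted statement: (publication year, entity A, relation, entity B).
Extraction = tuple[int, str, str, str]

def first_seen(extractions: list[Extraction]) -> dict[frozenset, int]:
    """Earliest publication year in which each (undirected) entity pair is linked."""
    earliest: dict[frozenset, int] = {}
    for year, a, _relation, b in extractions:
        edge = frozenset((a, b))
        if edge not in earliest or year < earliest[edge]:
            earliest[edge] = year
    return earliest

def network_up_to(extractions: list[Extraction], year: int) -> dict[str, set[str]]:
    """Adjacency sets of the network built only from papers published up to `year`."""
    graph: dict[str, set[str]] = defaultdict(set)
    for y, a, _relation, b in extractions:
        if y <= year:
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)

def indirect_candidates(graph: dict[str, set[str]], trait: str) -> set[str]:
    """Entities two hops from the trait that are not (yet) directly linked to it."""
    direct = graph.get(trait, set())
    two_hop: set[str] = set()
    for neighbour in direct:
        two_hop |= graph.get(neighbour, set())
    return two_hop - direct - {trait}

# Placeholder records for illustration only.
records = [
    (2010, "trait_T", "associated_with", "metabolite_M"),
    (2011, "metabolite_M", "produced_by", "gene_G"),
    (2013, "gene_G", "associated_with", "trait_T"),
]
print(indirect_candidates(network_up_to(records, 2011), "trait_T"))  # {'gene_G'}
print(first_seen(records)[frozenset(("gene_G", "trait_T"))])          # 2013
```

In this toy data, `gene_G` already appears as an indirect candidate in 2011, two years before the first direct statement in 2013. A real analysis would also need to weigh how much evidence supports each path.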

Subject(s)

Data Mining , Natural Language Processing , Plant Tubers/physiology , Solanum tuberosum/physiology , Color , Pigments, Biological

Machine learning to further improve the decision which boar ejaculates to process into artificial insemination doses.

Kamphuis, Claudia; Duenk, Pascal; Veerkamp, Roel Franciscus; Visser, Bram; Singh, Gurnoor; Nigsch, Annette; De Mol, Rudi Maria; Broekhuijse, Marleen Leonarda Wilhelmina Johanna.

Theriogenology ; 144: 112-121, 2020 Mar 01.

Article in English | MEDLINE | ID: mdl-31927416

ABSTRACT

Current artificial insemination (AI) laboratory practices assess the semen quality of each boar ejaculate to decide which ejaculates to process into AI doses. This decision is aided by two widely used motility parameters obtained through computer-assisted semen analysis (CASA). This decision process, however, still results in AI doses with variable and sometimes suboptimal fertility outcomes (e.g., small litter size). The hypothesis was that this decision can be improved by combining additional data from CASA systems and from other sources with a data-driven model. Available data consisted of ejaculates that passed the initial decision and were thus processed into AI doses and used to inseminate sows. Data were divided into a training set (6793 records) and a validation set (1191 records) for model development, and an independent test set (1434 records) for performance assessment. Gradient Boosting Machine (GBM) models were developed to predict four fertility phenotypes of interest (gestation length, total number born, number born alive, and number of stillborn piglets). Each fertility phenotype was considered both as a numeric and as a binary outcome parameter, giving eight different fertility phenotypes in total. Data used to further improve the decision process originated from four sources: 1) CASA information, 2) boar ejaculate information, 3) breeding value estimations, and 4) weather information. These data were used to create seven prediction sets, where each new set added parameters to those included in the previous set. On the test data, the GBM models predicted fertility phenotypes with low correlations (for numeric phenotypes) and low area under the curve values (for binary phenotypes). Hence, the results demonstrated that combining more data with GBM did not further improve the AI dose quality checks, and our hypothesis was rejected.
However, our study revealed parameters affecting boar ejaculate fertility that are not used in today's decision process. These parameters (listed in the top 10 in at least four GBM models) included one parameter associated with boar ejaculate information, two with breeding value estimations, five with CASA information, and one with weather information. These parameters should therefore be investigated further for their potential value in assessing the quality of boar ejaculates during routine daily processing of AI doses.
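The two test-set measures named above can be made concrete as follows: a correlation for numeric phenotypes and the area under the ROC curve (AUC) for binary ones. These are generic textbook definitions, assumed here for illustration. The abstract does not say which correlation coefficient or AUC implementation the authors used. The numbers are toy values, not data from the study.

```python
from math import sqrt

def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation between observed and predicted numeric phenotypes."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 values")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def roc_auc(labels: list[int], scores: list[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win. The pairwise loop is O(n_pos * n_neg), which is
    fine for illustration; larger data sets would use a rank-based formula.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Toy values only.
print(pearson([10.0, 12.0, 14.0], [11.0, 11.5, 13.0]))
print(roc_auc([1, 0, 1, 0], [0.9, 0.1, 0.4, 0.4]))  # 0.875
```

An AUC of 0.5 corresponds to random ranking and a correlation near 0 to no linear association. Results close to those values on the independent test set are what "low" means for the GBM models described above.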

Subject(s)

Insemination, Artificial/veterinary , Semen Analysis/veterinary , Semen Preservation/veterinary , Swine/physiology , Animals , Area Under Curve , Image Processing, Computer-Assisted , Machine Learning , Male , Semen Analysis/methods , Semen Preservation/methods

Differential diagnosis and surgical management of cecal dilatation vis-a-vis cecal impaction in bovine.

Singh, Gurnoor; Udehiya, Rahul Kumar; Mohindroo, Jitender; Kumar, Ashwani; Singh, Tarunbir; Verma, Pallavi; Devi, Nameirakpam Umeshwori; Anand, Arun.

Vet World ; 11(9): 1244-1249, 2018 Sep.

Article in English | MEDLINE | ID: mdl-30410228

ABSTRACT

AIM: The present study was undertaken to investigate the clinical and hemato-biochemical alterations, ultrasonographic findings, and surgical treatment of bovines suffering from cecal dilatation and cecal impaction. MATERIALS AND METHODS: The study was conducted on 11 bovines (9 buffaloes and 2 cattle) suffering from cecal dilatation (n=6) or cecal impaction (n=5). Surgical affections of the cecum were diagnosed on the basis of clinical examination, hemato-biochemistry, ultrasonography, and exploratory laparotomy. RESULTS: A marked decrease in serum total protein, albumin, chloride, potassium, and calcium levels and an increase in lactate concentration were recorded. Peritoneal fluid examination revealed an increase in total protein concentration. Per rectal examination along with ultrasonography was used as a confirmatory diagnostic tool for cecal dilatation and cecal impaction, and the ultrasonographic features of both conditions were recorded. Left flank laparorumenotomy was performed in the six animals with a dilated cecum and colonic fecalith. After rumenotomy, these animals were treated by massaging the cecum and kneading the colonic fecalith. Right flank typhlotomy was performed in the remaining five animals, which had an impacted cecum, to decompress the dilated cecum. Of the 11 animals that underwent surgery, 9 survived and remained healthy up to the 3-month follow-up. CONCLUSION: Ultrasonography was reliable in the diagnosis of cecal dilatation and cecal impaction in bovines. Left flank exploration after laparorumenotomy is an ideal surgical technique for managing cecal dilatation, while right flank typhlotomy is ideal for managing cecal impaction in bovines.

QTLTableMiner⁺⁺: semantic mining of QTL tables in scientific articles.

Singh, Gurnoor; Kuzniar, Arnold; van Mulligen, Erik M; Gavai, Anand; Bachem, Christian W; Visser, Richard G F; Finkers, Richard.

BMC Bioinformatics ; 19(1): 183, 2018 05 25.

Article in English | MEDLINE | ID: mdl-29801439

ABSTRACT

BACKGROUND: A quantitative trait locus (QTL) is a genomic region that correlates with a phenotype. Most of the experimental information about QTL mapping studies is described in tables of scientific publications. Traditional text mining techniques aim to extract information from unstructured text rather than from tables. We present QTLTableMiner++ (QTM), a table mining tool that extracts and semantically annotates QTL information buried in (heterogeneous) tables of plant science literature. QTM is a command line tool written in the Java programming language. The tool takes scientific articles from the Europe PMC repository as input and extracts QTL tables using keyword matching and ontology-based concept identification. The tables are further normalized using rules derived from table properties such as captions, column headers and table footers. Table columns are then classified into three categories, namely column descriptors, properties and values, based on the column headers and the data types of cell entries. Abbreviations found in the tables are expanded using the Schwartz and Hearst algorithm. Finally, the content of QTL tables is semantically enriched with domain-specific ontologies (e.g. Crop Ontology, Plant Ontology and Trait Ontology) using the Apache Solr search platform, and the results are stored in a relational database and a text file. RESULTS: The performance of the QTM tool was assessed by precision and recall, based on the information retrieved from two manually annotated corpora of open access articles, i.e. QTL mapping studies in tomato (Solanum lycopersicum) and in potato (S. tuberosum). In summary, QTM detected QTL statements in tomato with 74.53% precision and 92.56% recall, and in potato with 82.82% precision and 98.94% recall. CONCLUSION: QTM is a unique tool that aids in providing QTL information in machine-readable and semantically interoperable formats.
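The core idea of the Schwartz and Hearst abbreviation algorithm can be shown in a condensed Python sketch. For a short form in parentheses, the code takes a limited window of preceding words (at most min(|SF| + 5, 2 × |SF|) words, where |SF| is the number of characters in the short form). It then scans that window right to left, matching each short-form character in order, and requires the first character to start a word. This follows the published algorithm only in outline; for example, it ignores the reversed "short form (long form)" pattern. It is not QTM's Java code, and `extract_pairs` is a hypothetical helper name.

```python
import re
from typing import Optional

def find_best_long_form(short_form: str, long_form: str) -> Optional[str]:
    """Right-to-left character matching of a short form against a candidate long form."""
    s_index = len(short_form) - 1
    l_index = len(long_form) - 1
    while s_index >= 0:
        current = short_form[s_index].lower()
        if not current.isalnum():
            s_index -= 1
            continue
        # Move left until the characters match; the first short-form character
        # must also sit at the start of a word in the long form.
        while (l_index >= 0 and long_form[l_index].lower() != current) or (
            s_index == 0 and l_index > 0 and long_form[l_index - 1].isalnum()
        ):
            l_index -= 1
        if l_index < 0:
            return None
        l_index -= 1
        s_index -= 1
    # Extend left to the beginning of the word holding the first matched character.
    start = long_form.rfind(" ", 0, l_index + 1) + 1
    return long_form[start:]

def extract_pairs(sentence: str) -> list[tuple[str, str]]:
    """Find 'long form (SF)' pairs in one sentence (hypothetical helper)."""
    pairs = []
    for match in re.finditer(r"\(([^()]+)\)", sentence):
        short_form = match.group(1).strip()
        # Keep plausible short forms: 2-10 characters, alphanumeric start, at least one letter.
        if not (2 <= len(short_form) <= 10 and short_form[0].isalnum()
                and any(c.isalpha() for c in short_form)):
            continue
        words = sentence[: match.start()].split()
        max_words = min(len(short_form) + 5, 2 * len(short_form))
        candidate = " ".join(words[-max_words:])
        long_form = find_best_long_form(short_form, candidate)
        if long_form and len(long_form) > len(short_form):
            pairs.append((short_form, long_form))
    return pairs

print(extract_pairs("A quantitative trait locus (QTL) is a genomic region."))
# [('QTL', 'quantitative trait locus')]
```

Because the match must consume the short form from its last character backwards, the shortest valid long form is found first. The word-start constraint on the first character then prevents matches that begin mid-word.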

Subject(s)

Data Mining/methods , Quantitative Trait Loci , Software , Algorithms , Computer Graphics , Databases, Factual , Solanum lycopersicum/genetics , Publications , Semantics , Solanum tuberosum/genetics